17 research outputs found

    Image Aesthetics Assessment Using Composite Features from off-the-Shelf Deep Models

    Full text link
    Deep convolutional neural networks have recently achieved great success on the image aesthetics assessment task. In this paper, we propose an efficient method that takes the global, local, and scene-aware information of images into consideration and exploits composite features extracted from the corresponding pretrained deep learning models, classifying the derived features with a support vector machine. Contrary to popular methods that require fine-tuning or training a new model from scratch, our training-free method directly takes the deep features generated by off-the-shelf models for image classification and scene recognition. We also analyze the factors that influence performance from two aspects: the architecture of the deep neural network and the contribution of local and scene-aware information. It turns out that a deep residual network produces more aesthetics-aware image representations, and composite features improve overall performance. Experiments on common large-scale aesthetics assessment benchmarks demonstrate that our method outperforms state-of-the-art results in photo aesthetics assessment. Comment: Accepted by ICIP 201
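
    A minimal sketch of the training-free pipeline described above: pooled features from two off-the-shelf backbones are concatenated and classified with an SVM. The scene branch is assumed to be a ResNet with Places365 weights (loading them is left as a placeholder), and random tensors stand in for real, preprocessed photos.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

object_cnn = models.resnet50(weights="IMAGENET1K_V2")  # object-classification features
scene_cnn = models.resnet50(weights=None)              # assume Places365 weights loaded here
for m in (object_cnn, scene_cnn):
    m.fc = torch.nn.Identity()   # keep the 2048-d globally pooled features
    m.eval()

@torch.no_grad()
def composite_features(images):
    # images: (N, 3, 224, 224) normalized batch; concatenate both feature sets
    return torch.cat([object_cnn(images), scene_cnn(images)], dim=1)

images = torch.randn(8, 3, 224, 224)      # stand-in for preprocessed photos
labels = [0, 1, 0, 1, 0, 1, 0, 1]         # high/low aesthetics labels
clf = SVC(kernel="linear").fit(composite_features(images).numpy(), labels)
```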

    Association between admission-blood-glucose-to-albumin ratio and clinical outcomes in patients with ST-elevation myocardial infarction undergoing percutaneous coronary intervention

    Get PDF
    Introduction: It is unclear whether the admission-blood-glucose-to-albumin ratio (AAR) predicts adverse clinical outcomes in patients with ST-segment elevation myocardial infarction (STEMI) who are treated with percutaneous coronary intervention (PCI). Here, we performed an observational study to explore the predictive value of AAR for clinical outcomes.
    Methods: Patients diagnosed with STEMI who underwent PCI between January 2010 and February 2020 were enrolled in the study. The patients were classified into three groups according to AAR tertile. The primary outcome was in-hospital all-cause mortality, and the secondary outcomes were in-hospital major adverse cardiac events (MACEs), as well as all-cause mortality and MACEs during follow-up. Logistic regression, Kaplan–Meier analysis, and Cox proportional hazard regression were the primary analyses used to estimate outcomes.
    Results: Among the 3,224 enrolled patients, there were 130 cases of in-hospital all-cause mortality (3.9%), and 181 patients (5.4%) experienced MACEs. After adjustment for covariates, multivariate analysis demonstrated that an increase in AAR was associated with an increased risk of in-hospital all-cause mortality [adjusted odds ratio (OR): 2.72, 95% CI: 1.47–5.03, P = 0.001] and MACEs (adjusted OR: 1.91, 95% CI: 1.18–3.10, P = 0.009), as well as long-term all-cause mortality [adjusted hazard ratio (HR): 1.64, 95% CI: 1.19–2.28, P = 0.003] and MACEs (adjusted HR: 1.58, 95% CI: 1.16–2.14, P = 0.003). Receiver operating characteristic (ROC) curve analysis indicated that AAR was an accurate predictor of in-hospital all-cause mortality (AUC = 0.718, 95% CI: 0.675–0.761) and MACEs (AUC = 0.672, 95% CI: 0.631–0.712).
    Discussion: AAR is a novel and convenient independent predictor of all-cause mortality and MACEs, both in-hospital and long-term, for STEMI patients receiving PCI.
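
    A hedged illustration of the AAR covariate and an in-hospital outcome model of the kind described above; the variable names and the tiny synthetic values are ours, not the study's data, and a real analysis would add the adjustment covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

glucose = np.array([7.8, 11.2, 6.1, 14.5])    # admission blood glucose, mmol/L (synthetic)
albumin = np.array([38.0, 31.5, 42.0, 29.0])  # serum albumin, g/L (synthetic)
aar = glucose / albumin                       # admission-glucose-to-albumin ratio

X = aar.reshape(-1, 1)                        # adjustment covariates would be appended here
y = np.array([0, 1, 0, 1])                    # in-hospital all-cause mortality (synthetic)
model = LogisticRegression().fit(X, y)
odds_ratio = np.exp(model.coef_[0, 0])        # OR per unit increase in AAR
```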

    Bidirectional Temporal-Recurrent Propagation Networks for Video Super-Resolution

    No full text
    Recently, convolutional neural networks have achieved remarkable performance in video super-resolution. However, how to exploit the spatial and temporal information of a video efficiently and effectively remains challenging. In this work, we design a bidirectional temporal-recurrent propagation unit, which allows temporal information to flow from frame to frame in an RNN-like manner and avoids complex motion estimation modeling and motion compensation. To better fuse the information of the two temporal-recurrent propagation units, we use channel attention mechanisms. Additionally, we adopt a progressive up-sampling method instead of one-step up-sampling, which we find gives better experimental results. Extensive experiments show that our algorithm outperforms several recent state-of-the-art video super-resolution (VSR) methods with a smaller model size.
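
    A minimal sketch of the overall idea, under our own assumptions: one forward and one backward recurrent pass over the frames, a channel-attention gate on the fused states, and progressive (2x then 2x) up-sampling via pixel shuffle. Layer sizes and the fusion details are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BiTRP(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.fwd = nn.Conv2d(3 + ch, ch, 3, padding=1)   # forward recurrence
        self.bwd = nn.Conv2d(3 + ch, ch, 3, padding=1)   # backward recurrence
        self.attn = nn.Sequential(                       # channel attention gate
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * ch, 2 * ch, 1), nn.Sigmoid())
        self.up = nn.Sequential(                         # progressive 4x up-sampling
            nn.Conv2d(2 * ch, 3 * 4, 3, padding=1), nn.PixelShuffle(2),
            nn.Conv2d(3, 3 * 4, 3, padding=1), nn.PixelShuffle(2))

    def forward(self, frames):                           # frames: (N, T, 3, H, W)
        n, t, _, h, w = frames.shape
        c = self.fwd.out_channels
        hf = frames.new_zeros(n, c, h, w)
        hb = frames.new_zeros(n, c, h, w)
        for i in range(t):                               # RNN-like temporal propagation
            hf = torch.relu(self.fwd(torch.cat([frames[:, i], hf], 1)))
            hb = torch.relu(self.bwd(torch.cat([frames[:, t - 1 - i], hb], 1)))
        fused = torch.cat([hf, hb], 1)                   # fuse the two directions
        return self.up(fused * self.attn(fused))         # attend, then up-sample

sr = BiTRP()(torch.randn(1, 5, 3, 32, 32))               # -> (1, 3, 128, 128)
```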

    Part-Wise Adaptive Topology Graph Convolutional Network for Skeleton-Based Action Recognition

    No full text
    Human action recognition is a computer vision task that involves identifying and classifying human movements and activities. Human behavior comprises movements of multiple body parts, and Graph Convolutional Networks (GCNs) have emerged as a promising approach for this task. However, most contemporary GCN methods perform graph convolution on the entire skeleton graph without considering that the human body consists of distinct body parts. To address these shortcomings, we propose a novel method that optimizes the representation of the skeleton graph by designing temporal and spatial convolutional blocks and introduces the Part-wise Adaptive Topology Graph Convolution (PAT-GC) technique. PAT-GC adaptively learns the segmentation of different body parts and dynamically integrates the spatial relevance between them. Furthermore, we utilize hierarchical modeling to divide the skeleton graph, capturing a more comprehensive representation of the human body. We evaluate our approach on three publicly available large datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400. Our experimental results demonstrate that our approach achieves state-of-the-art performance, validating the effectiveness of our proposed technique for human action recognition.
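
    A hedged sketch of a graph convolution whose adjacency is learnable and restricted to joints within the same body part. The joint-to-part grouping and the masking scheme below are our illustrative simplification of the part-wise adaptive topology idea, not the paper's PAT-GC layer.

```python
import torch
import torch.nn as nn

class PartAdaptiveGC(nn.Module):
    def __init__(self, in_ch, out_ch, parts):
        super().__init__()
        self.parts = parts                          # list of joint-index lists
        n = sum(len(p) for p in parts)
        self.adj = nn.Parameter(torch.eye(n))       # learnable (adaptive) topology
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                           # x: (N, joints, in_ch)
        mask = torch.zeros_like(self.adj)
        for p in self.parts:                        # keep edges within each part
            idx = torch.tensor(p)
            mask[idx.unsqueeze(1), idx] = 1.0
        return torch.relu((self.adj * mask) @ self.proj(x))

parts = [[0, 1, 2], [3, 4], [5, 6]]                 # e.g. torso, left arm, right arm
out = PartAdaptiveGC(3, 16, parts)(torch.randn(2, 7, 3))   # -> (2, 7, 16)
```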

    A Novel Coarse-to-Fine Method of Ship Detection in Optical Remote Sensing Images Based on a Deep Residual Dense Network

    No full text
    Automatic ship detection in optical remote sensing images is of great significance due to its broad applications in maritime security and fishery control. Most ship detection algorithms use a single-band image to design low-level, hand-crafted features, which are easily influenced by interference such as clouds and strong waves and are not robust to large-scale variation of ships. In this paper, we propose a novel coarse-to-fine ship detection method based on the discrete wavelet transform (DWT) and a deep residual dense network (DRDN) to address these problems. First, multi-spectral images are adopted for sea-land segmentation, and an enhanced DWT is employed to quickly extract ship candidate regions while keeping missed detections as low as possible. Second, panchromatic images with clear spatial details are used for ship classification. Specifically, we propose the local residual dense block (LRDB) to fully extract semantic features via a local residual connection and densely connected convolutional layers. DRDN mainly consists of four LRDBs and is designed to further remove false alarms. Furthermore, we exploit a multiclass classification strategy, which can overcome the large intra-class difference of targets and identify ships of different sizes. Extensive experiments demonstrate that the proposed method is highly robust in complex image backgrounds and achieves higher detection accuracy than other state-of-the-art methods.
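
    A minimal sketch of the LRDB as described above: densely connected convolutional layers whose outputs are concatenated, a 1x1 fusion layer, and a local residual connection. Channel counts and depth are illustrative placeholders.

```python
import torch
import torch.nn as nn

class LRDB(nn.Module):
    def __init__(self, ch=32, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch + i * growth, growth, 3, padding=1) for i in range(layers))
        self.fuse = nn.Conv2d(ch + layers * growth, ch, 1)   # 1x1 feature fusion

    def forward(self, x):
        feats = [x]
        for conv in self.convs:                     # dense connectivity: each layer
            feats.append(torch.relu(conv(torch.cat(feats, 1))))  # sees all earlier maps
        return x + self.fuse(torch.cat(feats, 1))   # local residual connection

y = LRDB()(torch.randn(1, 32, 64, 64))              # -> (1, 32, 64, 64)
```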

    Unsupervised Adversarial Defense through Tandem Deep Image Priors

    No full text
    Deep neural networks are vulnerable to adversarial examples, which are synthesized by adding imperceptible perturbations to the original image yet fool the classifier into producing wrong predictions. This paper proposes an image restoration approach that provides a strong defense mechanism against adversarial attacks. We show that an unsupervised image restoration framework, the deep image prior, can effectively eliminate the influence of adversarial perturbations. The proposed method uses multiple deep image prior networks, called tandem deep image priors, to recover the original image from an adversarial example. Tandem deep image priors contain two deep image prior networks: the first captures the main information of the image, and the second recovers the original image based on the prior information provided by the first. The proposed method reduces the number of iterations originally required by a deep image prior network and requires no adjustment to the classifier or pre-training; it can also be combined with other defensive methods. Our experiments show that the proposed method achieves higher classification accuracy on ImageNet against a wide variety of adversarial attacks than previous state-of-the-art defense methods.
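
    A hedged sketch of the tandem idea: a first deep-image-prior network is fitted to the adversarial image, and a second one is then fitted to the first network's output. The tiny architecture and iteration counts are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

def dip(target, steps):
    # fit a small conv net from fixed random noise to `target` (deep image prior)
    net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1))
    z = torch.randn_like(target)                  # fixed random input
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(z) - target) ** 2).mean()    # early stopping keeps the prior
        loss.backward()
        opt.step()
    return net(z).detach()

adv = torch.rand(1, 3, 64, 64)                    # stand-in adversarial example
coarse = dip(adv, steps=100)                      # first prior: main image structure
restored = dip(coarse, steps=100)                 # second prior: refined recovery
```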

    Inter-Level Feature Balanced Fusion Network for Street Scene Segmentation

    No full text
    Semantic segmentation, as a pixel-level recognition task, is widely used in a variety of practical scenes. Most existing methods try to improve network performance by fusing information from high and low layers, but simple concatenation or element-wise addition leads to unbalanced fusion and low utilization of inter-level features. To solve this problem, we propose the Inter-Level Feature Balanced Fusion Network (IFBFNet), which guides inter-level feature fusion in a more balanced and effective direction. The overall network follows the encoder–decoder architecture. In the encoder, we use a relatively deep convolutional network to extract rich semantic information. In the decoder, skip connections fuse low-level spatial features to gradually restore a clearer boundary expression, and we add an inter-level feature balanced fusion module to each skip connection. Additionally, to better capture boundary information, we add a shallower spatial-information stream to supplement spatial details. Experiments demonstrate the effectiveness of our module. IFBFNet achieves competitive performance on the Cityscapes dataset using only finely annotated data for training and improves greatly over the baseline network.
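
    An illustrative sketch of balancing a skip connection: learned per-channel weights re-scale the low-level and high-level features before fusion, instead of plain concatenation or addition. The specific gating design is our assumption, not the paper's module.

```python
import torch
import torch.nn as nn

class BalancedFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())   # per-channel balance weights

    def forward(self, low, high):
        # low: encoder spatial features; high: upsampled deep semantic features
        w = self.gate(torch.cat([low, high], 1))
        return w * low + (1 - w) * high               # balanced, not plain, fusion

fused = BalancedFusion(64)(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```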

    Blind Image Quality Assessment Based on Classification Guidance and Feature Aggregation

    No full text
    In this work, we present a convolutional neural network (CNN) named CGFA-CNN for blind image quality assessment (BIQA). A two-stage strategy is used: Sub-Network I first identifies the distortion type in an image, and Sub-Network II then quantifies this distortion. Unlike most deep neural networks, we extract hierarchical features as descriptors to enhance the image representation and design a feature aggregation layer, trained end-to-end, that applies Fisher encoding to visual vocabularies modeled by Gaussian mixture models (GMMs). To cover both authentic and synthetic distortions, the hierarchical features combine the characteristics of a CNN trained on our self-built dataset and a CNN trained on ImageNet. We evaluated our algorithm on four publicly available databases, and the results demonstrate that CGFA-CNN outperforms other methods on both synthetic and authentic databases.
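
    A minimal sketch of the two-stage control flow: a stage-one classifier predicts a distortion type, which selects a stage-two quality head. The backbone, the five distortion types, and the per-type heads are stand-ins, not the CGFA-CNN layers.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
subnet1 = nn.Linear(16, 5)                        # 5 hypothetical distortion types
subnet2 = nn.ModuleList(nn.Linear(16, 1) for _ in range(5))  # one head per type

def predict_quality(img):
    feat = backbone(img)
    dist_type = subnet1(feat).argmax(dim=1)       # stage 1: identify the distortion
    scores = torch.stack([subnet2[int(t)](feat[i])   # stage 2: quantify it
                          for i, t in enumerate(dist_type)])
    return dist_type, scores

dtype, score = predict_quality(torch.rand(2, 3, 224, 224))
```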

    Swin-MFA: A Multi-Modal Fusion Attention Network Based on Swin-Transformer for Low-Light Image Human Segmentation

    No full text
    In recent years, image segmentation based on deep learning has been widely used in medical imaging, autonomous driving, monitoring, and security. In monitoring and security, the specific location of a person is detected by image segmentation and separated from the background to analyze the person's specific actions. Under low-illumination conditions, however, this poses a great challenge for traditional image-segmentation algorithms, and scenes with low light, or even no light at night, are often encountered in monitoring and security. Against this background, this paper proposes a multi-modal fusion network based on an encoder–decoder structure. The encoder, which contains a two-branch Swin-Transformer backbone instead of a traditional convolutional neural network, fuses the RGB and depth features with a multiscale fusion attention block. The decoder is also built on the Swin-Transformer backbone and is connected to the encoder via several residual connections, which prove beneficial in improving the accuracy of the network. Furthermore, this paper is the first to propose a low-light human segmentation (LLHS) dataset for portrait segmentation, with aligned depth and RGB images finely annotated under low illuminance, captured by combining a traditional monocular camera and a depth camera with active structured light. The network is also tested at different illumination levels. Experimental results show that the proposed network is robust for human segmentation in low-light environments with varying illumination. On the LLHS dataset, Swin-MFA achieves a mean Intersection over Union (mIoU), a common measure of segmentation performance, of 81.0, better than ACNet, 3DGNN, ESANet, RedNet, and RFNet using depth at the same level in a mixed multi-modal network, and far ahead of segmentation algorithms that use only RGB features, so it has important practical significance.
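
    Since the abstract reports mIoU, here is a minimal reference computation of that metric for the binary human/background case of portrait segmentation; the random maps are stand-ins for real predictions and annotations.

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2):
    # average per-class intersection-over-union across classes present in either map
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 2, (240, 320))         # stand-in predicted mask
gt = np.random.randint(0, 2, (240, 320))           # stand-in ground-truth mask
print(mean_iou(pred, gt))
```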

    Partial Atrous Cascade R-CNN

    No full text
    Deep-learning-based segmentation methods have achieved excellent results. As two main tasks in computer vision, instance segmentation and semantic segmentation are closely related and mutually beneficial; spatial context information from semantic features can improve the accuracy of instance segmentation. Inspired by this, we propose a novel instance segmentation framework named Partial Atrous Cascade R-CNN (PAC), which effectively improves the accuracy of segmentation boundaries. The proposed network innovates in two aspects. (1) A semantic branch with a partial atrous spatial pyramid extraction (PASPE) module, which consists of atrous convolution layers with multiple dilation rates; by expanding the receptive field of the convolutional layers, multi-scale semantic features are greatly enriched, and experiments show that the new branch obtains more accurate segmentation contours. (2) A mask quality (MQ) module that scores the intersection over union (IoU) between the predicted mask and the ground-truth mask; benefiting from the modified mask quality score, the quality of the segmentation results is judged credibly. Our network is trained and tested on the MS COCO dataset and, compared with the benchmark, brings consistent and noticeable improvements when using the same backbone.
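
    An illustrative sketch of an atrous pyramid of the kind the PASPE module builds on: parallel atrous convolutions with several dilation rates, concatenated and fused by a 1x1 layer. The rates and channel counts are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AtrousPyramid(nn.Module):
    def __init__(self, ch=64, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        # each branch sees a different receptive field; fuse the multi-scale context
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], 1))

y = AtrousPyramid()(torch.randn(1, 64, 50, 50))    # -> (1, 64, 50, 50)
```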